Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional Top Level Keys and Configuration Log Option #96

Merged
merged 6 commits into from
Apr 6, 2017

Conversation

jacknagz
Copy link
Contributor

@jacknagz jacknagz commented Apr 6, 2017

to: @airbnb/streamalert-maintainers
size: medium
resolves #95

Library Changes

  • Nest all conf/log type options under a new key called configuration
    • hints stays at the top level, since it's not an option, but rather a guard.
    • delimiter, separator move under configuration.
    • envelope is now called envelope_keys and nested under configuration.
    • records is now called json_path and also nested under configuration.
  • Create a new configuration key called optional_top_level_keys.
    • The goal of this option is to allow log schemas to have optional keys with specific types.
    • Sets default values for optional keys not in a specific incoming record.
  • Fix various pylint offenses.

Other Changes

  • Update unit tests
  • Update sample configurations
  • Update scripts
    • Add an autopep helper script
    • Update the unit_test.sh script to test the stream_alert_cli package
    • Enforce 80%+ code coverage for this repo

Notes

  • I will update documentation once the proposed changes are accepted.
  • This will break most existing conf/logs.json settings if the keys are not updated as described above.

Copy link
Contributor

@austinbyers austinbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments inline

return bool()
elif key == []:
return list()
elif key == OrderedDict():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This smells funny. As you point out, lists and dicts cannot be keys in a Python dictionary. So why do you allow them as your own keys? Why isn't it just key == 'list' and key == 'dict'?

Keep in mind, this will only match if the key is an empty OrderedDict. Do you mean to use type?

Finally, if key == OrderedDict(), you return dict(), meaning you've removed the ordering. Why the discrepancy in types?

Copy link
Contributor Author

@jacknagz jacknagz Apr 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@austinbyers it's not key == 'list' or 'dict' because in the schema's they are dictated as [] or {}. Since we load as an OrderedDict, empty {} is converted to OrderedDict(). It's mean't to be empty.

json_payload[key_name] = default_optional_values(value_type)

# Handle jsonpath extraction of records
if config_options and len(config_options) and records_schema:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The len check is redundant: if config_options returns True only if config_options is not None and is non-empty

'kinesis': {
'data': base64.b64encode(kinesis_data)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indentation doesn't any sense to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autopep did it 😢

jacknagz pushed a commit that referenced this pull request Apr 6, 2017
@jacknagz
Copy link
Contributor Author

jacknagz commented Apr 6, 2017

Thanks for the review @austinbyers, I just addressed your comments

@austinbyers
Copy link
Contributor

LGTM

Jack Naglieri added 6 commits April 6, 2017 12:22
Instead of having optional top level keys for a log type, move all options into 'configuration'.
This includes KV/CSV delimiter/separator, json_path, and envelope keys.
The goal of optional_top_level_keys is to provide a mechanism for a more flexible schema.
If the key(s) described in optional_top_level_keys exist in the incoming record, they are simply
type checked.  If they do not exist, a default value is created in place to avoid
rules calling keys that do not exist.
@jacknagz jacknagz force-pushed the jacknagz-top-level-keys-and-parser-config-opts branch from f1c67b9 to bd062ea Compare April 6, 2017 19:22
@jacknagz jacknagz merged commit 8ed2c18 into master Apr 6, 2017
@jacknagz jacknagz deleted the jacknagz-top-level-keys-and-parser-config-opts branch April 6, 2017 19:27
ryandeivert pushed a commit that referenced this pull request Apr 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improvement: Optional Top Level Keys in a Schema and Parser Option Overhaul
2 participants